Integrated information gain with extra tree algorithm for feature permission analysis in android malware classification

Author: Howida AbuBaker Al-Kaaf
Publication date: 01/01/2022

The rapid growth of free applications in the Android market has led to the fast spread of malware apps, putting at risk the sensitive personal information that users store on their mobile devices when using those apps. The permission mechanism is designed as a security layer that protects the Android operating system by restricting access to local system resources at installation time and, in updated versions of the Android operating system, at run time. Even though permissions provide a security layer to users, they can be exploited by attackers to threaten user privacy. Consequently, exploring the patterns of those permissions becomes necessary to find the relevant permission features that contribute to classifying Android apps. However, in the era of big data, with the rapid explosion of malware and many unnecessarily requested permissions, recognizing permission patterns in these data has become a challenge, because irrelevant and redundant features degrade classification performance and increase computational cost. An Ensemble-based Extra Tree - Feature Selection (FS-EX) algorithm is proposed in this study to explore the permission patterns by selecting a minimal-sized subset of highly discriminant permission features capable of distinguishing malware samples from non-malware samples. An integrated Information Gain with Ensemble-based Extra Tree - Feature Selection (FS-IGEX) algorithm is also proposed, which assigns weight values to permission features instead of binary values in order to determine the impact of weighted attribute variables on classification performance. The two proposed methods, both based on Ensemble Extra Tree Feature Selection, were evaluated on five datasets of various sample sizes and feature-space sizes using nine machine learning classifiers.
Comparison studies were carried out among the FS-EX subsets, the dataset of Full Permission features (FP), and the two approaches of the FS-IGEX method: the Permission-Binary (PB) approach and the Permission-Weighted (PW) approach. Permissions under PB were represented with binary values, whereas permissions under PW were represented with weighted values. The results demonstrated that the FS-EX approach was promising in obtaining the most prominent permission features related to the class target, attaining the same or nearly the same classification accuracy as the FP, with a highest mean accuracy of 96%. In addition, the PW approach of the FS-IGEX method yielded highly influential weighted permission features that could classify apps as malware or non-malware with a highest mean accuracy of 93%, compared to the PB approach of the FS-IGEX method and the FP.
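
As a rough illustration of the difference between the binary (PB) and weighted (PW) representations described above, the following sketch computes the information gain of each permission on a tiny made-up dataset and uses it as a feature weight. The toy apps, the labels, and the rule of multiplying each binary value by its information gain are assumptions made for illustration only; the abstract does not specify how FS-IGEX derives or applies its weights, and the Extra Tree selection step is not shown.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature_values, labels):
    """IG(Y; X) = H(Y) - sum over v of P(X = v) * H(Y | X = v)."""
    total = len(labels)
    conditional = 0.0
    for value in set(feature_values):
        subset = [y for x, y in zip(feature_values, labels) if x == value]
        conditional += (len(subset) / total) * entropy(subset)
    return entropy(labels) - conditional


# Hypothetical toy data: one row per app, one column per permission
# (1 = the app requests the permission, 0 = it does not).
permissions = ["SEND_SMS", "INTERNET", "READ_CONTACTS"]
binary_apps = [
    [1, 1, 1],
    [1, 1, 0],
    [0, 1, 1],
    [0, 1, 0],
]
labels = ["malware", "malware", "benign", "benign"]

# One weight per permission: its information gain with respect to the class.
weights = {
    name: information_gain([row[i] for row in binary_apps], labels)
    for i, name in enumerate(permissions)
}

# Assumed weighting rule: replace each binary value by value * weight.
weighted_apps = [
    [row[i] * weights[name] for i, name in enumerate(permissions)]
    for row in binary_apps
]

for name in permissions:
    print(f"{name}: {weights[name]:.3f}")
```

In this toy data, SEND_SMS separates the two classes perfectly and receives the full class entropy as its weight, while INTERNET (requested by every app) and READ_CONTACTS (evenly split across classes) receive a weight of zero, so the weighted representation keeps only the informative permission.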

Universiti Teknologi Malaysia Institutional Repository